Revise testing infrastructure to decrease spurious failures #759
@eitsupi I think all the tests in https://github.com/rocker-org/rocker-versioned2/actions/runs/7721419467/workflow?pr=759 run on the same runner? Public repo runners for Linux have 150 GB. It looks like even the smallest change triggers the full matrix of tests, and at the moment that full matrix just can't run on the runner, as there isn't enough space? E.g. this initial test PR shows 6 of the 38 tests failing. They fail for various reasons, though none are related to the change in the PR; they are either network issues or lack of space. I think your plans in #755 to better leverage the cache in the redesigned build infrastructure will greatly improve this situation. But meanwhile I think it's difficult to make a meaningful PR against the repo that won't hit failed tests for unrelated issues, especially the disk space error. I will try testing out some options here for a potentially slimmer test matrix as a work-around...
As I commented elsewhere, the capacity issue should be resolved by removing unnecessary software. rocker-versioned2/.github/workflows/reports.yml Lines 49 to 51 in 26c50e5
I think this is affected by a change made a while ago that increased the I think reducing tests and merging incorrect changes is just as bad an idea as ignoring and merging tests that randomly fail.
@eitsupi Thanks for your help here. To be clear, we are entirely on the same page about not reducing testing and not merging incorrect changes. Having unrelated tests fail for unrelated reasons does not reduce incorrect changes. The current testing does not cover most cases anyway, and I'd like to actually add more tests to get better coverage, not less. As I said at the top of this, I'm not seeking to remove tests overall, I'm testing the removal of tests here to try and get a handle on disk use so I can add tests. We are on the same objective here.
I cannot currently fix things that are broken and have been broken for a long time and have never been covered by our tests while PRs are throwing errors that are entirely unrelated to those changes and are also not reflective of problems either in the existing stack or the proposed changes.
I don't understand the solution you are proposing -- that we shrink the size of the images themselves or that we free space from other software on the host runner? You suggested removing arrow libraries I think? As I noted in #756, adding support for tensorflow and torch libraries adds 4 - 5 GB each. If you are aware of unnecessary software that could free enough extra space to test ML libraries then a PR would be awesome.
The test images should have 150 GB of disk. We should be able to run tests on the 13 GB base cuda image. While the matrix setup is nice, perhaps it would avoid these issues to have tests handled on different runners? The current test design is not compatible with testing the large images involved in the machine learning stack. I suggest we move that testing to a separate runner. We may want it on a separate runner anyway so we have the option of a self-hosted runner setup to run these images on GPU.
Sorry I didn't explain better, but my point was that we can free up space by deleting unnecessary stuff on the runner, which is what Apache Arrow's main repository does thoroughly, and I think the script below does just that. If I remember correctly, all testing is done on a separate runner, so the reason why the test for the ML image fails is simply because that image is too huge.
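The runner-cleanup idea described above could be sketched as a workflow step like the following. This is a hedged illustration, not the script from reports.yml or from Apache Arrow: the step name and the list of directories are assumptions about software commonly preinstalled on GitHub-hosted Ubuntu runners, and the space actually recovered will vary by runner image.

```yaml
# Hypothetical step: free disk space on a GitHub-hosted Ubuntu runner
# before building large images. The directories below are assumptions
# about preinstalled toolchains; adjust them to the runner image in use.
steps:
  - name: Free up disk space (illustrative)
    run: |
      echo "Disk usage before cleanup:"
      df -h /
      # Remove preinstalled SDKs that a Docker-only job does not need.
      sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
      # Drop any Docker images cached on the runner.
      docker image prune --all --force
      echo "Disk usage after cleanup:"
      df -h /
```

Printing `df -h` before and after the cleanup shows in the job log how much space was actually recovered, which helps judge whether the step is worth its added complexity.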
daily is already being built daily, and it has been failing for a week anyway. It should not be tested on PRs that only touch unconnected parts of the stack
In the above edits, I have moved cuda images out of tests/rocker_scripts/matrix.json, since that runner simply does not have enough space to test anything involving the now 13 GB cuda base image, let alone the potentially larger derivative images. I've moved cuda into a separate workflow. I initially structured this off the same design as the rocker_scripts test, but even with a single test there it doesn't have enough space. I don't really understand why -- maybe buildkit is using additional space? I rewrote the action to be concise and simple.
@eitsupi It would be great if you'd like to add those image-size-reducing changes from arrow. I'm reluctant to paste them in here as it adds complexity to the build system and I don't really understand what it is doing. (For instance, I don't see how it can really free 20 GB from every ubuntu 22.04 runner, when according to the GitHub docs runners on private repos don't even have 20 GB SSDs to begin with...)
Anyway, I'd like to move forward on this so we can get testing unstuck for #756 and so we can start providing the main ML frameworks on our ML-tagged images.
Thanks for looking into this.
Please update the PR title and description.
Is there any reason why we should keep this in a separate yml file?
Is splitting another job not enough?
Sorry, not entirely sure I follow. I think having a separate script is necessary to run on another runner? I find this architectural design easier to understand and maintain -- it's fewer pieces and distinct parts are isolated in different files. By splitting another job you mean doing this through the matrix.json file? I did look at the design of a matrix.json for this (in a separate yaml file, though perhaps could be merged): https://github.com/rocker-org/rocker-versioned2/pull/759/files#diff-454bb856f3821991ff2015ec5ba81c69df2ae3b5a476e30104d697c420b35093, and it hits the same issue with space.
My suggestion is simply that it is possible to move the job to scripts-test.yml.
I don't know if they're sharing capacity on a per-job basis or on a per-workflow basis, but I think it's unlikely that they're sharing it across workflows.
So you probably don't need to split the workflow for a single job here.
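On GitHub-hosted runners each job in a workflow is started on its own fresh runner, so a second job in the same file would get its own disk. A minimal sketch of that layout follows; the job names, the `cuda` job, and the build commands are hypothetical and are not taken from the actual scripts-test.yml.

```yaml
# Sketch only: two independent jobs in one workflow file.
# Each job is scheduled on a separate GitHub-hosted runner.
name: scripts-test (illustrative)
on:
  pull_request:
    branches:
      - master
jobs:
  scripts:            # hypothetical name for the existing script tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the rocker_scripts matrix here"
  cuda:               # hypothetical job for the large CUDA image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the ML test image
        # Build context "." is an assumption.
        run: docker build -f tests/ml-test.Dockerfile .
```

Splitting jobs this way isolates their disks from each other, but it does not enlarge the disk available to any single job.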
oh sorry to be dense, I think I understand. I've always tended to write actions with basically one job per yaml, unless there were two dependent jobs (e.g. like generate_matrix and build in scripts-test.yml). I think I do the same thing with, e.g. writing R functions in many different files in R/. I see the cuda tests as a bit of a work in progress, while for the moment scripts-test.yml has been pretty stable, so conceptually to me anyway this provides a bit more separation between the component I'm still intending to fiddle with and the component that works. Why have all the jobs in the same yaml file?
.github/workflows/cuda-test.yml (outdated)
    on:
      workflow_dispatch: null
      push:
        paths:
          - tests/ml-test.Dockerfile
          - .github/workflows/cuda-test.yml
Please reconsider.
walk me through your thinking? if we change the workflow or the test, we should run the workflow, right? or is the former implicit anyway?
Like this:
rocker-versioned2/.github/workflows/scripts-test.yml
Lines 3 to 14 in 26c50e5
    on:
      pull_request:
        branches:
          - master
        paths:
          - tests/rocker_scripts/Dockerfile
          - tests/rocker_scripts/matrix.json
          - tests/rocker_scripts/test.sh
          - scripts/*.sh
          - "!scripts/install_R_*.sh"
          - "!scripts/setup_R.sh"
      workflow_dispatch:
- I don't think push should be used without specifying a tag or branch.
- The actual target we want to test is the PRs, not pushes.
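Applied to the CUDA workflow, the two points above might look like the following trigger block. This is a sketch that reuses the paths from the earlier cuda-test.yml snippet and mirrors scripts-test.yml; the final file in the PR may differ.

```yaml
# Sketch: run on pull requests against master that touch the test
# Dockerfile or the workflow itself, plus manual runs.
on:
  pull_request:
    branches:
      - master
    paths:
      - tests/ml-test.Dockerfile
      - .github/workflows/cuda-test.yml
  workflow_dispatch:
```

Listing the workflow file itself under `paths` keeps the behaviour asked about earlier: a change to the workflow still triggers a run, but only from a pull request.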
oh thanks for explaining, got it! done now.
.github/workflows/cuda-test.yml (outdated)
    jobs:
      build:
        runs-on: ubuntu-latest
        permissions: write-all
Is write-all needed here?
good question, let's try without that.
Sorry. What I was trying to say was that read permission would be sufficient.
But I don't think this test is ever using the GITHUB_TOKEN here anyway? I think I had left write-all in by mistake from a previous action that was also pushing to github (and even then it should plausibly have a more scoped write permission). I think the test is happy without this.
Since this repository is old, I assume write-all privileges are granted by default, but if you follow best practices, you should grant only minimal privileges.
Since we are only reading files here, I expect read permission to be sufficient.
oh good to know! I didn't realize they did legacy privileges but of course that makes sense or actions would break. I've gone with read-all now.
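For reference, a minimal sketch of a job with read-only token permissions. `permissions: read-all` grants read access to all scopes; the narrower `contents: read` shown in the comment is an alternative that assumes the job only checks out code, and it is not something settled in this thread.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # Read-only GITHUB_TOKEN; replaces the earlier write-all.
    permissions: read-all
    # A narrower alternative (assumption: the job only checks out code):
    # permissions:
    #   contents: read
    steps:
      - uses: actions/checkout@v4
```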
@@ -33,7 +32,7 @@
     "base_image": "rocker/r-ver",
     "tag": "devel",
     "script_name": "install_rstudio.sh",
-    "script_arg": "daily"
+    "script_arg": "latest"
Seems unrelated change.
As you know, daily has been failing for a week due to the RStudio server throwing a 500 error on that download. Because we try to build that image daily in the cron tasks anyway, I think it's less important to also test it here, especially when it means that the current situation ends up blocking the ability to fix anything. Maybe you can provide your preferred solution?
Oh, I'm sorry. I didn't know that the daily build was failing. Indeed, as you say, this is used every day, so we don't need to test it here.
I'm sure that "latest" was tested elsewhere (because latest is the default), so I think it's okay to delete the test itself.
ok thanks. yeah sorry I get the emails every day when the daily fails (https://github.com/rocker-org/rocker-versioned2/actions/workflows/devel.yml) which happens all the time. It looked like this test was building rstudio on r-ver:devel while other tests were building it on the normal r-ver so I didn't delete it, but if you're fine dropping this test then so am I!
The testing infrastructure fails to successfully test any cuda images due to disk space limitations. This separates out the testing of cuda images from other scripts.
Tests involving rstudio-daily are also failing continuously due to server issues with the downloads.
Both of these create conditions that guarantee test failure, making it impossible for PRs to satisfy all checks. This either prevents PR contributions or means that PR contributions are merged with some failing tests, neither of which is a good solution.